Reasoning-Driven Question-Answering For Natural Language Understanding
Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This goal has been pursued through different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it challenging for current state-of-the-art technology. The thesis is organized into three main parts:
In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions.
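To make the idea of essential-term detection concrete, here is a minimal hypothetical sketch: a linear classifier over hand-crafted per-token features flags the terms a reasoning system should attend to. The features, labels, and model choice below are illustrative assumptions, not the thesis's actual approach.

```python
# Hypothetical sketch of supervised essential-term detection (illustrative only).
# Each question token gets simple features and a binary "essential" label;
# a linear classifier then flags the terms a reasoning system should focus on.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(tokens, i):
    # Toy features: the token itself, its neighbors, and a crude length proxy.
    return {
        "tok": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        "is_long": len(tokens[i]) > 4,
    }

# Tiny labeled example: 1 = essential, 0 = not (labels invented for illustration).
question = "Which gas do plants absorb during photosynthesis ?".split()
labels = [0, 1, 0, 1, 1, 0, 1, 0]

X = [token_features(question, i) for i in range(len(question))]
vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), labels)

# At test time, keep only the predicted-essential terms.
test = "Which metal do magnets attract most strongly ?".split()
Xt = vec.transform([token_features(test, i) for i in range(len(test))])
print([t for t, y in zip(test, clf.predict(Xt)) if y == 1])
```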
In the second part, we propose two new challenge datasets of natural language questions: (i) the first requires reasoning over multiple sentences; (ii) the second requires temporal common-sense reasoning. We hope these datasets will motivate the field to address more complex problems.
In the final part, we present the first formal framework for multi-step reasoning algorithms in the presence of important properties of language use, such as incompleteness and ambiguity. We apply this framework to prove fundamental limitations of reasoning algorithms. These theoretical results provide additional intuition for the existing empirical evidence in the field.
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution
Augmenting large language models (LLMs) with external tools enhances their performance across a variety of tasks. However, prior works rely heavily on task-specific demonstrations of tool use, which limits their generalizability and incurs high computational cost from making many calls to large-scale LLMs. We introduce GEAR, a computationally efficient query-tool grounding algorithm that generalizes to a variety of tool-use tasks without relying on task-specific demonstrations. GEAR achieves better efficiency by delegating tool grounding to small language models (SLMs) and execution to LLMs, while leveraging semantic and pattern-based evaluation at both the question and answer levels for generalizable tool grounding. We evaluate GEAR on 14 datasets across 6 downstream tasks, demonstrating its strong generalizability to novel tasks, tools, and different SLMs. Despite being more efficient, GEAR achieves higher precision in tool grounding than prior strategies based on LLM prompting, thus improving downstream accuracy at reduced computational cost. For example, GEAR-augmented GPT-J and GPT-3 outperform their tool-augmented baseline counterparts because of better tool use.
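As a rough illustration of the grounding step described in the abstract, below is a minimal, self-contained Python sketch. It is not the authors' implementation: the token-overlap similarity is a stand-in for the SLM's semantic scoring, and the pattern check and `alpha` weight are invented for this toy example; real GEAR evaluates richer signals at the question and answer levels.

```python
# Hypothetical sketch of GEAR-style query-tool grounding (not the authors' code).
# A cheap scorer ranks each candidate tool on two signals:
#   1. semantic score: similarity between the query and the tool description,
#   2. pattern score: whether the tool's preliminary answer matches the answer
#      pattern the query seems to call for (e.g. a number vs. free text).
# Only the single best-grounded tool is handed to an expensive LLM for execution.
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]  # cheap preliminary execution, never a large-LLM call

def semantic_score(query: str, description: str) -> float:
    """Token-overlap stand-in for the SLM-based semantic similarity."""
    q = set(re.findall(r"\w+", query.lower()))
    d = set(re.findall(r"\w+", description.lower()))
    return len(q & d) / max(len(q | d), 1)

def pattern_score(query: str, answer: str) -> float:
    """Reward answers whose surface pattern fits the question type."""
    wants_number = bool(re.search(r"\b(how many|plus|minus|times|sum)\b", query.lower()))
    is_number = bool(re.fullmatch(r"-?\d+(\.\d+)?", answer.strip()))
    return 1.0 if wants_number == is_number else 0.0

def ground(query: str, tools: list[Tool], alpha: float = 0.5) -> Tool:
    """Pick the tool maximizing a mix of semantic and pattern-based scores."""
    def score(tool: Tool) -> float:
        preliminary = tool.run(query)  # cheap preliminary answer
        return (alpha * semantic_score(query, tool.description)
                + (1 - alpha) * pattern_score(query, preliminary))
    return max(tools, key=score)

tools = [
    Tool("calculator", "arithmetic: plus, minus, times on numbers",
         lambda q: str(sum(int(n) for n in re.findall(r"\d+", q)))),
    Tool("dictionary", "look up the definition of a word",
         lambda q: "a definition"),
]

best = ground("What is 17 plus 25?", tools)
print(best.name)  # -> calculator; only now would the expensive LLM execute it
```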